
Correlation Coefficient

correcoef(A: any[] | Mat, B: any[] | Mat): Number

param A - the first of the two data sets to correlate. Must be a JS array or Mat.

param B - the second data set. Must be a JS array or Mat with the same number of elements as A.

returns - a number representing the correlation coefficient of A and B, between -1 and 1.

If A or B is not a 1-d JS array, it is flattened to 1-d.

The correlation coefficient can be thought of as a number that represents how strongly two variables are linearly related. It takes two sets of data, A and B, where A contains, say, [0,1,1,2,2,3] and B contains, say, [0,0,1,1,2,3]. For each index, it compares how far the value in A lies from the mean of A with how far the value in B lies from the mean of B, and accumulates the products of these deviations as a sum.

Overall, it outputs a number between -1 and 1. The closer it is to 1, the stronger the positive correlation. The closer it is to -1, the stronger the inverse correlation. A value near 0 indicates little or no linear relationship.

A practical use is comparing an array A of housing prices over the years with an array B of crime counts over the same years. Is crime correlated with housing prices? If so, how strongly?

The correlation coefficient of the two inputs is given by the following formula, where $x_i$ and $y_i$ are the elements of A and B, and $\bar{x}$ and $\bar{y}$ are their means:

$$r = \frac{\sum(x_i - \bar{x}) (y_i - \bar{y})}{\sqrt{\sum(x_i - \bar{x})^2 \sum(y_i - \bar{y})^2}}$$
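The formula can be sketched in plain TypeScript as follows. This is a minimal illustration, not the library's actual `correcoef` implementation: the helper name `pearson` is hypothetical, and it assumes both inputs are already flat arrays of numbers with equal length.

```typescript
// Hypothetical helper illustrating the formula above.
// Assumes x and y are flat number arrays of the same, non-zero length.
function pearson(x: number[], y: number[]): number {
  if (x.length !== y.length || x.length === 0) {
    throw new Error("inputs must be non-empty and of equal length");
  }
  const n = x.length;

  // Means of x and y.
  let sumX = 0;
  let sumY = 0;
  for (let i = 0; i < n; i++) {
    sumX += x[i];
    sumY += y[i];
  }
  const meanX = sumX / n;
  const meanY = sumY / n;

  // Accumulate the three sums that appear in the formula.
  let sxy = 0; // sum of (x_i - meanX) * (y_i - meanY)
  let sxx = 0; // sum of (x_i - meanX)^2
  let syy = 0; // sum of (y_i - meanY)^2
  for (let i = 0; i < n; i++) {
    const dx = x[i] - meanX;
    const dy = y[i] - meanY;
    sxy += dx * dy;
    sxx += dx * dx;
    syy += dy * dy;
  }

  // If either input is constant, the denominator is 0 and the result is NaN.
  return sxy / Math.sqrt(sxx * syy);
}

// The example data from the description above.
const A = [0, 1, 1, 2, 2, 3];
const B = [0, 0, 1, 1, 2, 3];
const r = pearson(A, B); // roughly 0.897, a strong positive correlation
```

For the example data, both means and all three sums are computed in two passes, and the result of about 0.897 sits close to 1, which matches the intuition that B mostly rises as A rises.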